model routing: cost/latency ranking with ranked fallback list #849

Open
adilhafeez wants to merge 8 commits into main from adil/top-level-routing-preferences

@adilhafeez (Contributor) commented Mar 27, 2026

Summary

  • Top-level routing_preferences (v0.4.0+) with candidate model list and selection_policy
  • /routing/v1/* returns ranked models[] array; client uses models[0], falls back on 429/5xx
  • selection_policy.prefer: cheapest, fastest, random, none
  • model_metrics_sources: cost_metrics, prometheus_metrics, digitalocean_pricing (public DO catalog with model_aliases)
  • Startup errors for missing metric sources; startup + request-time warnings for unmatched models
  • Dropped legacy per-provider routing format
  • Demo updated to v0.4.0 with docker-compose (Prometheus + mock latency server)
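
A minimal client-side sketch of the fallback behaviour described above: try `models[0]` first, then move down the ranked list on 429 or 5xx. The `send` callable and the function names are hypothetical and not part of this PR; `send` stands in for whatever HTTP call the client makes for a single model.

```python
from typing import Callable, Optional, Sequence, Tuple


def is_retryable(status: int) -> bool:
    """Statuses that trigger a fallback to the next ranked model: 429 and any 5xx."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    models: Sequence[str],
    send: Callable[[str], Tuple[int, str]],
) -> Tuple[Optional[str], int, str]:
    """Walk the ranked list and return (model, status, body) for the first
    response that is not 429/5xx.

    `send` is a hypothetical caller-supplied function that issues the upstream
    request for one model and returns (http_status, body).
    """
    if not models:
        raise ValueError("routing response contained no models")
    last: Tuple[Optional[str], int, str] = (None, 0, "")
    for model in models:
        status, body = send(model)
        last = (model, status, body)
        if not is_retryable(status):
            return last
    # Every candidate was rate-limited or failed; surface the last attempt.
    return last
```

Injecting `send` keeps the loop independent of any particular HTTP library, so the same logic can wrap whichever client the caller already uses.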

fixes #848

- MetricsSource::DigitalOceanPricing variant: fetch public DO Gen-AI pricing, normalize as lowercase(creator)/model_id, cost = input + output per million
- cost_metrics endpoint format updated to { "model": { "input_per_million": X, "output_per_million": Y } }
- Startup errors: prefer:cheapest requires cost source, prefer:fastest requires prometheus
- Startup warning: models with no pricing/latency data ranked last
- One-per-type enforcement extended to digitalocean_pricing; error if cost_metrics and digitalocean_pricing are both configured
- cost_snapshot() / latency_snapshot() on ModelMetricsService for startup checks
- Demo config updated to v0.4.0 top-level routing_preferences with cheapest + fastest policies
- docker-compose.yaml + prometheus.yaml + metrics_server.py for demo latency metrics
- Schema and docs updated
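
To illustrate the rules listed above, here is a rough Python sketch of the key normalization, the combined cost, ranking with unmatched models last, and the startup checks. This is not the PR's Rust implementation. All function names are hypothetical, and three behaviours are assumptions rather than anything the description states: that `none` preserves the configured order, that ties keep the configured order, and that either `cost_metrics` or `digitalocean_pricing` counts as a cost source.

```python
import random
from typing import Dict, List, Set

COST_SOURCES = {"cost_metrics", "digitalocean_pricing"}


def normalize_do_key(creator: str, model_id: str) -> str:
    """Key format from the PR text: lowercase(creator)/model_id."""
    return f"{creator.lower()}/{model_id}"


def combined_cost(input_per_million: float, output_per_million: float) -> float:
    """Single cost figure used for ranking: input + output per million."""
    return input_per_million + output_per_million


def validate_policy(prefer: str, sources: Set[str]) -> None:
    """Startup checks; raises ValueError on a misconfiguration."""
    if COST_SOURCES <= sources:
        raise ValueError("cost_metrics and digitalocean_pricing both configured")
    if prefer == "cheapest" and not (COST_SOURCES & sources):
        raise ValueError("prefer: cheapest requires a cost source")
    if prefer == "fastest" and "prometheus_metrics" not in sources:
        raise ValueError("prefer: fastest requires prometheus_metrics")


def rank_models(
    candidates: List[str],
    prefer: str,
    cost: Dict[str, float],
    latency: Dict[str, float],
) -> List[str]:
    """Return candidates ordered by policy; models with no data go last."""
    if prefer == "none":
        return list(candidates)  # assumption: keep configured order
    if prefer == "random":
        return random.sample(candidates, len(candidates))
    if prefer == "cheapest":
        metric = cost
    elif prefer == "fastest":
        metric = latency
    else:
        raise ValueError(f"unknown selection policy: {prefer}")

    def sort_key(item):
        index, model = item
        value = metric.get(model)
        # Missing data sorts after any real value; index breaks ties stably.
        return (value is None, 0.0 if value is None else value, index)

    return [model for _, model in sorted(enumerate(candidates), key=sort_key)]
```

Sorting on a `(missing, value, index)` tuple keeps unmatched models in the list, so they remain available as last-resort fallbacks instead of being dropped.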

@adilhafeez changed the title from "add top-level routing_preferences with selection_policy and model metrics fetch" to "model routing: cost/latency ranking with ranked fallback list" Mar 28, 2026

Development

Successfully merging this pull request may close these issues.

feat: top-level routing_preferences with selection_policy and metrics fetch (v0.4.0)
